Improving performance of fuzzy string matching against a dictionary [closed]
Posted by Nathan Harmston on Programmers
Published on 2011-02-09T13:47:55Z
Indexed on 2011/02/09 15:33 UTC
Hi,
I'm currently using SecondString for fuzzy string matching against a large dictionary, where each entry has an associated non-unique identifier. I currently store this dictionary in a HashMap.
When I want to do fuzzy string matching, I first check whether the string is in the HashMap, and then I iterate through all of the other potential keys, calculating the string similarity for each and storing the key/value pair(s) with the highest similarity. Depending on which dictionary I am using, this can take a long time (the dictionaries range from 12330 to 1800035 entries). Is there any way to speed this up? I am currently writing a memoization function/table, but can anyone think of a better way to improve the speed? Maybe a different data structure, or something else I'm missing.
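The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name FuzzyLookup, the Map<String, String> layout (key to identifier), and the normalized Levenshtein similarity are hypothetical stand-ins, and SecondString's own distance classes are deliberately not used here. It also assumes that an exact hit is returned immediately without scanning the rest of the dictionary.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FuzzyLookup {

    // Stand-in similarity (assumption, not SecondString):
    // 1 - (Levenshtein distance / length of the longer string), in [0, 1].
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are identical
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the last completed row
            prev = curr;
            curr = tmp;
        }
        return 1.0 - (double) prev[b.length()] / maxLen;
    }

    // Returns the dictionary entries most similar to the query.
    // Ties are all kept, matching "the k,v pair/s with the highest similarity".
    static List<Map.Entry<String, String>> bestMatches(Map<String, String> dict,
                                                       String query) {
        List<Map.Entry<String, String>> best = new ArrayList<>();

        // Step 1: exact lookup, O(1) on average for a HashMap.
        String exact = dict.get(query);
        if (exact != null) {
            best.add(new AbstractMap.SimpleImmutableEntry<>(query, exact));
            return best; // assumption: an exact hit ends the search
        }

        // Step 2: linear scan over every key, one similarity computation per entry.
        double bestScore = -1.0;
        for (Map.Entry<String, String> e : dict.entrySet()) {
            double score = similarity(query, e.getKey());
            if (score > bestScore) {
                bestScore = score;
                best.clear();
            }
            if (score == bestScore) {
                best.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            }
        }
        return best;
    }
}
```

Each non-exact query costs one similarity computation per dictionary entry, which is why the scan becomes slow at the larger dictionary sizes mentioned above.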
Many thanks in advance,
Nathan